Flamenco is a music tradition from southern Spain that attracts a growing community of enthusiasts around the world. Its unique melodic and rhythmic elements, its typically spontaneous and improvised performance practice, and its stylistic diversity make this still largely undocumented art form particularly interesting material for musicological study. Prior work has already demonstrated that computational analysis of flamenco music, although a relatively new field of research, can provide powerful tools for the discovery and dissemination of this genre. In this paper we present corpusCOFLA, a data framework for the development of such computational tools. The proposed collection of audio recordings and metadata serves as a pool from which annotated subsets can be created for developing and evaluating algorithms for specific music information retrieval tasks. We first describe the design criteria used to create the corpus and then provide several examples of subsets drawn from it. We showcase possible research applications in the computational study of flamenco music and outline perspectives for the further development of the corpus.
24 pages, submitted to the ACM Journal of Computing and Cultural Heritage